This final report is a summary of the full project, which is presented
elsewhere as a doctoral dissertation:

Suydam, Marilyn N. "An Evaluation of Journal-Published Research
Reports on Elementary School Mathematics, 1900-1965."
Unpublished doctoral dissertation, The Pennsylvania State
University, 1967.

I. INTRODUCTION

Background

Since the beginning of this century, the field of educational research
has become increasingly important. Educational research in this context is
considered to include those educational investigations with some degree of
scientific procedure and/or control, involving data collection for a
specific purpose. This research is generally related to a hypothesis about
some aspect of the role of learning in the curriculum.

The realization that the controlled experiment is a feasible technique
for exploring many of the problems and issues which face educators
caused overwhelming optimism. This hope of a panacea which would resolve
all difficulties once and for all led to disillusionment. However, the
concept remained that research should help to point the way toward
certain decisions, even if many aspects of the educative process are not
readily accessible to its tactics.

Never has there been the emphasis on research that has been developing
during the past decade and is prevalent today. The need for using the
results of research to give direction to the teaching-learning process has
been intensified. The development of curriculum reform movements such as
that of "modern mathematics" has further accentuated this need within the
subject area of elementary school mathematics. Decisions about curriculum
innovations must be related to knowledge about curriculum content and
methods. A source of such knowledge and a foundation for decisions is
research.

Need for the Study

Research of the present and the future must be based on or indicate
consideration of research of the past. Implications for needed research
or connotations which could lead to creative development are a part of
almost all studies which have been done, either overtly or intrinsically.

One of the difficulties which any researcher faces is locating those
studies which will be of most use to him. If a researcher is interested
in elementary school mathematics, his search of the literature reveals
that there has been no single source of information on previous research
on the subject. Instead there are various types of lists, no one of
which is complete and current. The task of obtaining this information is
even more difficult for the teacher who is interested in utilizing the
results of research. It is necessary to synthesize the data which have
accumulated.

More than a compilation is needed, however. Scrutiny of the literature
reveals that there have been many complaints about the deficiencies
of educational research. For example, a sample of comments may be
considered:



Unfortunately, we are still using research methods that
are inadequate for the solution of the problems we face. . . .
Much of the research conducted in education is faulty. Many
studies contain flaws that automatically make them null and
void from the standpoint of application. These errors cover
all aspects of research. . . . (Mouly, 1963, 395)

Little of any value can be derived from the tons of research 
that have been conducted, and the majority of the studies are 
unreliable, trivial, and unworthy of serious consideration, 
much less application. (Tate, 1950, 11) 

. . . much that is called research cannot be considered such
when gauged by scientific standards. (Fehr, 1950, 11)

It should be apparent to the reader that few of the studies
which have been reported in this review offer evidence which
can be accepted without considerable reservation. Many of
them are faulty, either in design or in interpretation, or in
. . . [remainder of quotation and its citation illegible in source]

Since research efforts vary widely in quality, the question of how
much confidence can be placed in the findings of a study is one of
considerable importance. Because of this, a comprehensive compilation must
contain some indication of the value of each study. This attempt to
evaluate is a significant characteristic of the present study.

Review of Related Literature

The literature was searched to find the answers to three questions:
(1) Is there a compilation of the research on elementary school
mathematics? (2) Is there an evaluative compilation of the research on
elementary school mathematics? (3) Is there an instrument for evaluating
research?

Previous Compilations of the Research on Elementary School Mathematics.
There are many compilations, no one of which is truly comprehensive. The
existing ones may be grouped into three classifications: reviews, topical
summaries, and bibliographical listings.

Among the reviews are twelve published by the Review of Educational
Research. These are descriptive accounts, primarily concerned with
reporting significant findings, conclusions, and implications of the research
within a specified period of time. The Cyclopedia of Education and the
editions of the Encyclopedia of Educational Research contain summaries of
the most significant conclusions and implications of research over the
years, within the framework of usefulness to teaching, with criteria
determined by the reviewer's philosophy. Glennon and Hunnicutt (1952, 1958),
Morton (1953), and Spitzer (1962) have discussed the implications of the
research in mathematics for the classroom teacher in pamphlets which
enumerate applications but do not directly quote the research. All of the
reviews are summarized on Table I in Appendix A.

In the group of topical summaries are those studies which review the
research on a particular topic. Several are particularly good examples
of carefully done research of this type. Brownell and others (1941)
critically analyzed the research for the findings applicable to the teaching
of arithmetic in grades 1 and 2. Johnson (1944) compared and noted
weaknesses of research on problem solving. The research from 1911 to 1940 on
methods of teaching arithmetic was compared by Knipp (1944). A list of
these and similar studies is presented on Table II in Appendix A.

Cited as bibliographical are those studies which have as a primary
purpose the listing of references. In some cases these are complete for
a specified period of time, while in other cases they are selected by
criteria not always specified by the reviewer.

Buswell and Judd (1925), recognizing the need for synthesis, compiled
a list of 320 titles which included both research and critical discussions.
Buswell continued this practice for the next seven years, but from 1933
changed his emphasis, presenting only "selected references" without
attempting completeness. Hartung continued this practice from 1943 through 1964.
Monroe and Engelhart (1931) used the lists developed by Buswell and
Judd (1925) and Buswell (1926-1930) as a primary source of titles for
their summary, which includes only research.

Stretch (1941), Van Engen (1950), Gibb (1954), Hunnicutt and Iverson 
(1958), and Schaaf (1960) presented selected references, with the basis 
for selection generally one of pertinence, but not precisely defined.

Weaver (1957, 1958-66) probably presented the most complete lists of
[illegible] "normative and experimental research." His sources
were journals and other publications, as well as Dissertation Abstracts.
He attempted to secure an exhaustive listing, but only for the years since
1950. "Some significant articles," in addition to research reports, were
cited. Annotations have been included on several of these lists, but
categorization was not a consistent feature.

The compilation of doctoral dissertations by Summers and others (1961,
1963, 1965) would seem to be complete, but is not categorized in all cases.
Brown and various co-authors (1953, 1954, 1955, 1958, 1960, 1963, 1965)
made little attempt to be exhaustive, but listed current research at the
doctoral level and that which was supported by government funding. Burns
and Dessart (1965, 1966) summarized investigations for a limited period.
Table III in Appendix A summarizes the most pertinent bibliographical
listings.



Previous Evaluative Compilations of the Research on Elementary School
Mathematics. Most of the compilations are evaluative only by selection or
omission, primarily by the criterion of appropriateness to the specified
topic or period of time. In only ten cases are critical comments of some
type made. Bernstein (1959) stated evaluative reactions to some of the
research on remedial arithmetic. Brownell and others (1941) included
critical comments in the course of their discussion on primary arithmetic.
Criticism on design and findings was made by Hightower (1954). Writing
in 1914, Howell noted that some experiments are open to question. Johnson
(1944) pointed out weaknesses in some of the studies on problem solving,
as did Weaver (1956) in his critical review of research on compound
subtraction. Buswell and Judd (1925), Schaaf (1960), and Weaver (1957,
1958-1966) all noted specific criteria for acceptance or rejection. Monroe
and Engelhart (1931) used a criterion to select experimental and research
. . . from the lists compiled by Buswell and Judd (1925) and
Buswell (1926-1930). Four other criteria were utilized to evaluate this
research: control of variables, accuracy and validity of the measures used,
and justification for the generalization.

Previous Instruments for Evaluating Educational Research. Six
instruments for evaluating research, all of which have been tested for
reliability, have been found. For three of these (Cook, 1964; Hodges, 1966;
Wandt, 1965), no reliability data are available. For one (Shaycroft and
Altman, 1955) the reliability is so low (.186) that usefulness of the
instrument is questionable. The remaining two have been found to be
helpful in evaluating educational research.

Johnson (1957) attempted to "devise and evaluate a technique which
would facilitate the acquisition of skill in summarizing and evaluating
scientific research articles in education." The technique and the report
and evaluation sheet were developed during a graduate course on
educational research methods. The interrelationship between student
evaluations was .76 (significant beyond .01), using a "random split corrected
for attenuation." The relationship between student evaluation and expert
evaluation was .78 (significant beyond .01), while the interrelationship
between expert evaluations was .79 (significant beyond .01).

Gephart (1964) attempted to "determine the interrater reliability
of a research evaluation instrument . . . structured . . . through the
identification of action verbs and the objects of these action verbs
used in describing the research process." The interrater reliability
for overall evaluation ratings was .76 for rankings and .74 for ratings
(significant beyond .001), using Kendall's W.

In addition to these instruments, many suggestions have been made
on ways to do better research. Brooks (1923), Brownell (1947),
Farquhar and Krumboltz (1959), Fox (1958), Gates (1949), Good (1929,
1963), Kerlinger (1964), McCall (1923), MacDonald (1966), Monroe and
Engelhart (1931), Mouly (1963), Perdew (1950), Scates and Hoban (1937),
Symonds (1956), Travers (1964), Tyler (1958), Van Dalen (1958), Wolfle
(1949), the Encyclopedia of Educational Research, and the Bureaus of
Educational Research at the University of Minnesota and the Ohio State
University have all suggested criteria for the evaluation of educational
research, either in the form of a list or as a specific suggestion.

No evaluative instrument has been applied to the research in
elementary school mathematics as far as can be ascertained from the
literature.

Description of the Study 

A list of all reports of research which relate to the teaching of
mathematics in the elementary school (kindergarten through grade eight)
and which have been printed in journals published in the United States
during the years from 1900 through 1965 has been compiled. Each study
was categorized by mathematical topic and type of study. The research
which is experimental was also categorized by design paradigm. Specific
information on statistical procedure, variables controlled, sampling
procedures and size, type of test, grade level, and duration was
included whenever applicable. Major conclusions which appear consistent
with the data of each study were noted.

An Instrument for Evaluating Experimental Research Reports was
developed and tested for reliability. The experimental research was
evaluated with this instrument, and each study was assigned to a
composite evaluative category.

In addition, a list of dissertations which have been completed
has been compiled, in order to increase the comprehensiveness of the
compilation.

Pertinent data have been summarized and major conclusions
pertaining to mathematical and educational research methodology are
reported. Limitations of the study are noted in the dissertation.¹

¹ Suydam, Marilyn N., "An Evaluation of Journal-Published Research
Reports on Elementary School Mathematics, 1900-1965," pp. 5-9.
Unpublished doctoral dissertation, The Pennsylvania State
University, 1967.




II. METHOD 



The study involved five stages: (1) compiling, (2) categorizing,
(3) developing an instrument, (4) evaluating, and (5) summarizing
the data on the reports of research on elementary school mathematics.

Compiling



Reports of research on elementary school mathematics printed from
1900 through 1965 in journals published in the United States have been
collected. To identify the reports, several procedures were used to
ensure as complete a listing as possible. The journals in which over
eighty per cent of the research reports for the post-1930 period
appeared were checked on a page-by-page basis. For other post-1930
journals and for all journals in the pre-1930 period, only references
cited by others were included. All issues of the Education Index from
the first issue in 1929 through 1966 were searched and each reported
article which seemed to pertain to research was individually checked.
Each report of research was scrutinized for any references made to
previous research. In addition, collections of research were examined.

A list of the dissertations on elementary school mathematics 
which were completed from 1900 through 1965 was compiled to extend 
the compilation. Dissertation Abstracts and previous investigations 
provided the major sources for this list. 

Categorizing 

Each report of journal-published research was categorized by
mathematical topic (see Appendix D) and type of study (see Appendix E).
Experimental research was categorized by design paradigm, statistical
procedure, sampling procedure and size, type of test, grade level, and
duration. When such information was available in a report for another
type of research, it was included. The major conclusions or findings
which seemed supported by the data have been noted for all types of
studies, and the independent (I) and dependent (D) variables have
been noted for experimental research, and for those types of action
research where it was possible to do so.
Developing an Instrument

While there is much in the literature on the need to evaluate
research, there is comparatively little specific help. Only two
instruments were readily available which were considered for use in
evaluating the research in arithmetic. However, neither of these
proved entirely suitable for the purpose. More information seemed
to be needed to support the items on the list developed by Johnson
(1957).² Gephart (1964) supplied additional information, but then
sacrificed the careful time-consuming rating of each sub-item to a
purely subjective final rating.³

There is a need for a comparatively simple instrument which
provides information concerning major factors and problems in research.
Thus, the more carefully controlled research can be separated from
that which was less well done. It is difficult, however, to attempt
to distinguish weaknesses in the research process from those of the
reporting process. Therefore, it is more precise to consider the
result as an evaluation of a report; however, its correlation with
that of the research on which it is based should be high.

²,³ See pages 6-7 for additional comments on these instruments.

The first stage in the formulation of the Instrument for
Evaluating Experimental Research Reports was to compile the lists of
suggestions proposed by writers in the field of educational research.
The following topics were consistently listed:

(1) Importance or significance of the problem
(2) Definition of the problem
(3) Design of the study
(4) Control of variables
(5) Sampling procedures
(6) Use of instruments
(7) Analysis of data
(8) Interpretation of results
(9) Reporting of the research.

Each of the points was stated in question form, to make it
possible to consider an evaluation. The nine questions were checked for
completeness of content, and were subjected to trial use in evaluating
several reports. It was evident that the instrument could be used more
effectively if some direction could be given in answering the nine
questions. Using "key points" with adjectives to give a range for each
made it reasonably certain that each rater would be focusing upon the
same aspect.

⁴ For a complete list of these, see page 7.

The instrument was tested for interrater reliability; the reports
of this testing are included in Chapter III. The instrument itself
and directions for its use are presented in Appendix F.

Evaluating

The compiled research published from 1900 through 1965 which was
categorized as "experimental" was evaluated with the Instrument for
Evaluating Experimental Research Reports. The restriction to
experimental research was necessary because of the design of the instrument.

A quantitative score was derived from evaluation with this
instrument. The sum of the numerical scores assigned to each question may be
considered as a basis for some degree of comparison.

As a final index to the research, each of the research studies
was assigned to a composite evaluative category. This index is included
to aid the reader in locating those studies which may best meet his
purposes. Symbols were chosen to represent:

EPD - Purpose, type of study, design, and statistical procedures
      seem sound and pertinent to curriculum today under
      the stated definition of experimental research.

ED  - Type of study, design, and statistical procedures seem
      sound and pertinent to curriculum today under the stated
      definition of experimental research, but the purpose
      does not seem pertinent.

EP  - Purpose seems pertinent to curriculum today, but type of
      study, design, and/or statistical procedures do not seem
      sound and/or accurate today under the stated definition
      of experimental research.

NE  - Study is not considered experimental research under the
      stated definition.
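
The four symbols amount to a small decision rule. A minimal sketch of that rule follows, assuming each report has already been reduced to three yes/no judgments; the `Judgment` class, its field names, and the `"unclassified"` fallback are illustrative assumptions, not part of the instrument described here.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """Hypothetical yes/no summary of a rater's view of one report."""
    is_experimental: bool    # meets the stated definition of experimental research
    purpose_pertinent: bool  # purpose seems pertinent to curriculum today
    method_sound: bool       # type of study, design, and statistics seem sound


def composite_category(j: Judgment) -> str:
    """Map the three judgments onto the symbols EPD, ED, EP, and NE."""
    if not j.is_experimental:
        return "NE"
    if j.purpose_pertinent and j.method_sound:
        return "EPD"
    if j.method_sound:
        return "ED"
    if j.purpose_pertinent:
        return "EP"
    # The text names no symbol for an experimental study that is neither
    # pertinent nor sound; this label is an assumption of the sketch.
    return "unclassified"


print(composite_category(Judgment(True, True, False)))  # EP
```

The order of the checks matters: the test for NE comes first because the other three symbols are defined only for studies that meet the definition of experimental research.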

The results of this study are summarized through the presentation
of pertinent data. The total number of uses within each of the
categories is depicted on tables. Major conclusions are cited, and the
major repetitive errors in research methodology are indicated.
Implications for further research are noted.



III. RESULTS

Investigations of the Reliability of the Instrument

The Instrument for Evaluating Experimental Research Reports which
was developed as one aspect of this study was tested in two separate
investigations for the degree of reliability or interrater agreement
which could be expected in its use.

The first study was on a smaller scale than the second, and was
conducted prior to any evaluation of the research reports. The
population of studies was limited by the extent to which the compilation
of reports had proceeded. Its purpose was to ascertain the level of
agreement among the writer and two other raters with a comparable
background.⁵ Provision was made for testing the effect of bias in reading
research reports. Since the period of training for this study was
limited, the measure of reliability secured may be considered to depict
a base level, rather than one inflated by the results of training which
almost always would lead to increased agreement.

In the second study, there was no training provided beyond the
directions stated on the instrument, for the same reason as in the
first study. Moreover, this provides a measure of the usefulness of
the instrument to diverse readers of a type who might plausibly use
the instrument in a realistic situation without extensive training
in its use.

⁵ These two raters, Cecil R. Trueblood and Lynn A. Watson, aided the
writer in evaluating the research for the years 1955 through
1965.






First Study. The procedures for the first study of interrater
agreement were:

1. The population of reports of experimental research which
   have been published in The Arithmetic Teacher from 1954
   through 1965 (volumes 1 through 12) was identified.

2. A sample of ten of these reports was randomly selected
   for reproduction.

3. The name of the author and the year of publication were
   deleted on five of the reproduced reports, selected at
   random.

4. Three doctoral candidates in elementary education were
   identified as raters.

5. The raters independently evaluated the ten selected
   reports of experimental research with the proposed
   instrument.
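
Steps 2 and 3 describe two successive random draws: ten reports from the population, then five of those ten for masking. The sketch below illustrates that procedure under stated assumptions; the function name, the integer report identifiers, and the population size of 40 are hypothetical, since the text does not give the size of the population.

```python
import random


def select_and_mask(report_ids, sample_size=10, masked_count=5, seed=None):
    """Draw a random sample of reports and mark a random subset for masking.

    Returns a list of (report_id, is_masked) pairs. A masked report is one
    whose author name and year of publication would be deleted before rating.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(report_ids), sample_size)
    masked = set(rng.sample(sample, masked_count))
    return [(report_id, report_id in masked) for report_id in sample]


# Hypothetical population of 40 reports, identified by number.
for report_id, is_masked in select_and_mask(range(1, 41), seed=7):
    print(report_id, "masked" if is_masked else "unmasked")
```

Fixing `seed` makes a particular draw reproducible; leaving it as `None` gives a fresh draw each time.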

The interrater agreement on overall ratings was determined, using
an analysis of variance procedure. The results for the first study
of reliability are presented on Table IV in Appendix B.

To determine the proper terms to use in the reliability formula,
expected mean squares were determined. Pooling was necessary to
secure error terms. When it was assumed that A was a fixed factor, and
B and C were random, the only significant effect was that for between
Articles (B within A). The interpretation of generalizability is thus
extended to include all judges of the same type, though with
recognition of the fact that power is lacking due to the small number of judges
involved in the study. The F ratios are presented on Table V in
Appendix B.

Since the masking treatment (A) and Between Judges (C) effects
were non-significant, the proper terms to use in the AOV reliability
formula are:

\[ r = 1 - \frac{MS_{\text{pooled error term}}}{MS_b} \]

With data from the present study inserted, the result is a coefficient
of .91 for interrater agreement.

\[ r = 1 - \frac{19.5}{218.0} = .91 \]

This coefficient estimates the correlation between the combined ratings 
of the three judges used in the study and the combined rating of 
another hypothetical random sample of judges taken from the same popula- 
tion and rating the same ten articles. 
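
As a check on the arithmetic, the coefficient can be recomputed from the two mean squares quoted above. This is a minimal sketch; the function name is an illustrative choice, and the values 218.0 and 19.5 are the ones given in the text.

```python
def reliability_of_average(ms_between: float, ms_error: float) -> float:
    """AOV reliability of the combined ratings: r = 1 - MS_error / MS_between."""
    return 1.0 - ms_error / ms_between


# Mean squares from the first study: between articles and pooled error.
first_study = reliability_of_average(ms_between=218.0, ms_error=19.5)
print(round(first_study, 2))  # 0.91
```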

The measures of reliability for two previously cited instruments
were obtained by different means. Johnson (1957), who found
reliability coefficients of .75 for student evaluations and .79 for ratings by
experts, used a "random split corrected for attenuation." Gephart
(1964) secured an interrater reliability of .764 for rankings and .749
for ratings using Kendall's W. The use of the AOV reliability formula
with the data from the present study is somewhat comparable to the
statistical treatments which were used in previous studies. The
coefficient of .91 compares favorably with these other estimates of observer
agreement.

However, as Ebel (1951, 408) states, this formula is sometimes
inappropriate when interrater agreements are in question:

If decisions are based upon average ratings, it of course
follows that the reliability with which one should be
concerned is the reliability of those averages. However, if
the raters ordinarily work individually, and if multiple
scores for the same theme or student are only available in
experimental situations, then the reliability of individual
ratings is the appropriate measure.

He suggests the use of an intraclass formula such as that presented
by Snedecor for the reliability of individual ratings:

\[ r = \frac{MS_b - MS_r}{MS_b + (k - 1)\,MS_r} \]

With data from this study inserted this results in the following:

\[ r = \frac{218.0 - 19.5}{218.0 + 2(19.5)} = .77 \]

Thus the coefficient of reliability which provides a measure of the
consistency probable with a single rater using the Instrument for
Evaluating Experimental Research Reports was found to be .77 in this
study. This is similar to the coefficients found for previous
instruments with less rigorous formulas. As a cross-check on the accuracy
of this result, the interrater correlations were found: r_{12} = .79,
r_{23} = [illegible], r_{13} = [illegible]. The mean of the interrater
correlations is .78, which confirms the accuracy of the measure of
intraclass reliability. It would seem that this coefficient is highly
satisfactory.
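
The two coefficients reported for the first study are algebraically linked: stepping the single-rater coefficient up to k raters with the Spearman-Brown formula reproduces the coefficient for combined ratings. The sketch below shows this with the mean squares quoted above; the function names are illustrative, and the Spearman-Brown step is added here as a check rather than taken from the report.

```python
def reliability_of_single_rating(ms_between: float, ms_error: float, k: int) -> float:
    """Intraclass reliability of one rater: (MS_b - MS_r) / (MS_b + (k - 1) MS_r)."""
    return (ms_between - ms_error) / (ms_between + (k - 1) * ms_error)


def step_up(r_single: float, k: int) -> float:
    """Spearman-Brown step-up from one rater to the average of k raters."""
    return k * r_single / (1 + (k - 1) * r_single)


r_single = reliability_of_single_rating(ms_between=218.0, ms_error=19.5, k=3)
print(round(r_single, 2))              # 0.77
print(round(step_up(r_single, 3), 2))  # 0.91
```

Read this way, .77 and .91 are two views of the same data: the first describes one rater working alone, the second the pooled judgment of three.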

Second Study. The procedures for the second study paralleled
those for the first study except that a more diverse population was
considered:

1. A population of reports of experimental research which
   have been published in journals in the United States
   from 1930 through 1965 was identified.

2. A stratified sample of ten of these reports was
   selected. Stratification was on the basis of a) journal
   source, b) status of author, and c) year of
   publication.

3. These ten reports were reproduced. On five, selected
   at random, journal source, status of author, and year
   of publication were deleted.

4. Twelve raters who were representative of groups most
   likely to be involved in evaluating educational research
   were identified:

   a. Three doctoral candidates in elementary education
   b. Three doctoral candidates in educational psychology
   c. Three professors in elementary education
   d. Three professors in educational psychology.

5. Each of the raters independently evaluated each of the
   ten reports with the proposed instrument.

The results of the analysis of variance for the second study are
presented on Table VI in Appendix B.

Expected mean squares were determined for the condition where
articles (BwA) and judges (EwCD) were both random, and all other
factors fixed. The F ratios for the second study are presented on Table
VII in Appendix B. Other effects were nonsignificant; they were pooled
to form the error term.

It will be noted from the table that the articles effect (BwA)
was significant, as in the first study. Therefore generalizability
may be considered to extend to all judges in these subsets, though
this study provides more power.

Using the AOV reliability formula, the coefficient of .94 for
interrater agreement was found.

\[ r = 1 - \frac{11.08}{183.97} = .94 \]

When Snedecor's formula is used, the resulting coefficient is .57.

\[ r = \frac{183.97 - 11.08}{183.97 + (11)(11.08)} = .57 \]

When correlations between each pairing of the twelve raters were
computed, the mean was found to be .57. This serves as a check on the
accuracy of the intraclass reliability.
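
The cross-check described above averages the correlation over every pairing of raters. A minimal sketch of that computation follows. The ratings are hypothetical (three raters, five articles), since the raw scores are not reproduced in this report, and the Pearson product-moment coefficient is assumed as the correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(ratings_by_rater):
    """Average the correlation over every pairing of raters."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical total scores: one list per rater, one entry per article.
ratings_by_rater = [
    [31, 18, 25, 12, 28],
    [29, 20, 22, 15, 30],
    [33, 16, 27, 10, 24],
]
mean_r = mean_pairwise_correlation(ratings_by_rater)
print(round(mean_r, 2))
```

With three raters there are three pairings to average; with twelve raters, as in the second study, there are 66.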

The degree of interrater agreement again compares favorably with
those found for previous instruments. The reliability of individual
ratings which is derived from Snedecor's formula is lower, and may
present a more realistic picture of the variability which may be
expected from the use of a rating instrument of this kind with a single
rater.

As a comparative analysis for interest and information, each of
the four subsets of three raters was considered separately. The
coefficients for interrater agreement and for individual reliability
which were found for each subset of raters are presented on Table VIII
in Appendix B.

Implications. The Instrument for Evaluating Experimental Research
Reports was found to have coefficients of interrater agreement which
ranged from .77 to .94, using the analysis of variance formula. The
particular set of judges being considered caused the range. The
reasons for the range with varying groups are a matter of conjecture,
and beyond the scope of the present study. The set of judges was
apparently quite homogeneous, since there is apparently one general factor
which is being tested, with all other factors accounting for only a
small portion of the variance.

The measures of reliability reported were based on studies in
which the raters received no training. In previous investigations with
similar scales, it has been found that training will increase the
degree to which raters agree.

The coefficients for one judge using the instrument ranged from
.53 to .78. Any one individual's perception apparently lowers the
level of reliability which can be predicted. Anyone who uses the
instrument should ascertain the degree of interrater agreement and/or
the coefficient for one rater which applies to that particular
situation.

Summarization and Analysis of Data

A total of 799 analyses are presented in the dissertation.⁶

Journals. These 799 research reports were found in fifty journals
presented in Table IX in Appendix C. Three journals published over
half (54%) of the reports. Ten journals published 84% of the reports;
thirteen journals, 89%. The remaining reports (11%) were published in
37 journals.

Years. A count of the distribution by years revealed that 2 reports
were found for the decade 1900-1910; 36 for 1911-1920; 89 for
1921-1930; 167 for 1931-1940; 118 for 1941-1950; 165 for 1951-1960; and
222 for 1961-1965. The figure for the last five-year period is
obviously greater than for any prior ten-year period, underlining the
emphasis being placed on research today.
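
The year counts can be tallied to confirm that they account for all 799 reports and that the final five-year figure exceeds every earlier period. This is a minimal sketch using only the counts quoted above; the variable names are illustrative.

```python
# Report counts by period, as quoted in the text.
reports_by_period = {
    "1900-1910": 2,
    "1911-1920": 36,
    "1921-1930": 89,
    "1931-1940": 167,
    "1941-1950": 118,
    "1951-1960": 165,
    "1961-1965": 222,
}

total = sum(reports_by_period.values())
largest_earlier = max(
    count for period, count in reports_by_period.items() if period != "1961-1965"
)

print(total)            # 799
print(largest_earlier)  # 167
```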

⁶ Suydam, Marilyn N., op. cit., pages 50-438.

Mathematical topic and type of study. Table X in Appendix C
presents the frequency by mathematical topic and the frequency by type
of study. The number of reports of experimental research was 246,
a figure almost equalled by the 230 reports of surveys which were
found. Totals for other types of studies were: descriptive, 107;
case study, 18; action, 63; correlational, 56; and ex post facto,
79.
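
Expressed as shares of the 799 reports, these counts show how closely surveys follow experimental studies. The sketch below uses only the counts quoted above; the percentages are derived here and are not reported in the text.

```python
# Report counts by type of study, as quoted in the text.
reports_by_type = {
    "experimental": 246,
    "survey": 230,
    "descriptive": 107,
    "case study": 18,
    "action": 63,
    "correlational": 56,
    "ex post facto": 79,
}

total = sum(reports_by_type.values())
shares = {
    kind: round(100 * count / total, 1) for kind, count in reports_by_type.items()
}

print(total)  # 799
for kind, share in shares.items():
    print(f"{kind}: {share}%")
```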

The distribution of reports gives some indication of the concern
for various topics, as well as an indication of the fact that some
topics lend themselves more readily to one type of research. For
instance, readiness (b-1) is most readily ascertained through surveys,
while case studies were most frequently used to depict
individualization techniques, particularly for remediation (e-2).

Cross-referencing. Cross-referencing adds more depth, for in
many cases the topic which was cited first was selected arbitrarily.
The totals within each mathematical category shift somewhat as all
references are counted. The topics under which the largest number of
all types of research were categorized are:



(1)  a-5b:  problem solving (84)
(2)  f-2:   achievement evaluation (76)
(3)  a-3:   planning and organizing for teaching (62)
(4)  d-1:   textbooks (56)
(5)  e-1:   diagnosis (55)
(6)  e-2:   remediation (52)
(7)  b-5:   content to be included in grade (46)
(8)  f-1:   testing (44)
(9)  a-5a:  drill and practice (43)
(10) c-8:   measurement (43)
(11) c-3d:  division of whole numbers (40)

Design paradigm. A frequency distribution was made for the design
paradigms which were categorized. Those more frequently noted were:



(1) 3.4:   pretest-posttest, insufficient information re n (50)
(2) 1.2:   one group pretest-posttest (25)
(3) 3.21:  non-equivalent control group, pretest-posttest (18)
(4) 3.22:  non-equivalent control group, posttest only (18)
(5) 3.19:  posttest only, own control, insufficient information
           re n (17)
(6) 2.2:   pretest-posttest, control group, matched, n = students
           (14)
(7) 3.1:   pretest-posttest, control group, matched, n = students
           when the sampling unit seems to have been classes (13)
(8) 3.8:   posttest only, insufficient information re n (13)

Analysis of these types reveals a problem which is shown in several
ways: sampling, and/or the way in which a researcher reported the
sampling for his experiment, was a point of great variability and
ambiguity. Of the 246 experimental studies, 39 involved no control
group, while another 150 involved possible sampling errors.

Statistical procedure. Descriptive statistics are noted in almost
two-thirds of the reports. The other techniques most noted were:



(1) 3.4:   t-test (123)
(2) 6.4:   correlation (89)
(3) 3.3:   F-test (68)
(4) 3.2:   analysis of variance (58)
(5) 3.15:  z-test, critical ratio (43)
(6) 3.5:   analysis of covariance (37)
(7) 2.6:   Chi square test for independence (30)
(8) 3.17:  probable error (24)

Evaluative category ... as a referent for determining ultimate value
of the studies in the opinion of the reviewer. The majority of the
studies (553) were labeled non-experimental. Of the 246 experimental
studies, 112 were labeled "EPD"*; 9 were labeled "ED"; and 125 were
labeled "EP". Thus, only 14% of all studies, or 46% of the
experimental studies, were considered sound and pertinent today as
experimental research.
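The two percentages quoted above follow directly from the category counts. As a minimal sketch (the dictionary and function names are illustrative assumptions, not part of the study), the arithmetic can be reproduced as:

```python
# Counts per composite evaluative category, as given in the text.
EVALUATIVE_COUNTS = {"non-experimental": 553, "EPD": 112, "ED": 9, "EP": 125}


def epd_shares(counts):
    """Return totals and the share of 'EPD' studies among all and among experimental studies."""
    total = sum(counts.values())
    experimental = total - counts["non-experimental"]
    pct_of_all = 100 * counts["EPD"] / total
    pct_of_experimental = 100 * counts["EPD"] / experimental
    return total, experimental, pct_of_all, pct_of_experimental


total, experimental, pct_all, pct_exp = epd_shares(EVALUATIVE_COUNTS)
print(total, experimental, round(pct_all), round(pct_exp))
```

Rounded to whole percents, the two shares correspond to the 14% and 46% reported in the text.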

Qualitative value. Analysis of the qualitative values which
resulted from application of the Instrument for Evaluating Experimental
Research Reports shows a range from 13 to 44 of a possible 9 to 45.
Table XI in Appendix C shows the distribution by three periods of time:
1900-1929, 1930-1950, and 1951-1965.



Two questions, those involving control of variables and sampling,
were rated especially low. The percentages for those which were
assigned ratings of satisfactory or better on each question are:

(1) How practically or theoretically significant is the
problem? 73.5%

(2) How clearly defined is the problem? 72.3%

(3) How well does the design answer the research question? 50.7%

(4) How adequately does the design control variables? 29.7%

(5) How properly is the sample selected for the design and
purpose of the research? 27.7%

(6) How valid and reliable are the measuring instruments or
observational techniques? 53.3%

(7) How valid are the techniques of analysis of data? 44.4%

(8) How appropriate are the interpretations and
generalizations to the data? 59.8%

(9) How adequately is the research reported? 65.0%

*See pages 12 and 13 for definitions of these symbols.

Dissertations. A total of 470 dissertations on elementary school
mathematics were found for the 65-year period.

Analysis of content. Only eighty reports of the 246 in the
experimental category were considered satisfactory or better on total
scores. This would seem to indicate a need to improve the reporting of
research and possibly research procedures as well.

When these eighty studies are considered, in many cases no summary
can be made, either because there was only one study in a category or
because the studies were aimed at diverse phases. In other cases,
inconsistency is evidenced. Some specific help is provided for the
classroom teacher (and this is the ultimate purpose of any research),
but there is no clear and well-defined pattern evidenced from research.



IV. SUMMARY AND IMPLICATIONS



Final Summary 

1. A list of all reports of research which relate to the
teaching of mathematics in the elementary school and which have been
printed in journals published in the United States during the years
from 1900 through 1965 has been compiled. A total of 799 research
reports were found in 50 journals.

2. Each study is categorized by mathematical topic and type
of study. Of the total, 207 were placed primarily in the categories
for educational objectives and instructional procedures; 63 in
topical placement; 154 in basic concepts and methods of teaching them;
78 in materials; 131 in individual differences; 99 in evaluating
progress; and 67 were categorized as studies related to learning theory.
The frequency by types of studies was: descriptive, 107; survey,
230; case study, 18; action, 63; correlational, 56; ex post facto,
79; and experimental, 246.

3. The research which is experimental is also categorized by
design paradigm. Of the 246 experimental studies, 39 involved no
control group; 150 involved possible sampling errors; while only 57
seemed to be valid examples of more carefully designed experiments.

4. Specific information on statistical procedure, variables
controlled, sampling procedures and size, type of test, grade level,
and duration is included whenever applicable in the analysis of each
report.

5. Major conclusions which appear consistent with the data
in each study are also noted with the analysis of each report.

6. An Instrument for Evaluating Experimental Research Reports
was developed and tested for reliability. In one study with three
judges, the interrater agreement was found to be .91, while the
intraclass reliability was .77. In a second study with twelve judges,
the interrater reliability was found to be .94, with an intraclass
reliability coefficient of .58.

7. The experimental research is evaluated with this
instrument. None of the reports was rated excellent in overall rating;
34 of the reports were rated very good; 60, good; 84, fair; and 68, poor.

8. Each study was assigned to a composite evaluative category.
553 were non-experimental; 112, "EPD" (purpose, type of study, design,
and statistical procedures seem sound and pertinent to curriculum
today under the stated definition of experimental research); 9, "ED"
(type of study, design, and statistical procedures seem sound and
pertinent to curriculum today under the stated definition of
experimental research, but the purpose does not seem pertinent); and 125,
"EP" (purpose seems pertinent to curriculum today, but type of study,
design, and/or statistical procedures do not seem sound and/or
accurate today under the stated definition of experimental research).
9. A list of 470 dissertations which have been completed was
compiled and included in the appendix to increase the comprehensiveness
of the compilation.

Pertinent data were summarized and major conclusions pertaining
to mathematical and educational research methodology are ...
applicability to a theory of instruction is evident.



Suggestions and Implications

The first time a project of the type involved in the present 
study is attempted, there is a process of evolvement. The basic model 
or structure is revised time and again. Therefore, the following 
suggestions are made: 

1. There is a need to replicate this study, using the present 
structure as the basis. 

2. More precise definitions of such categories as design 
paradigm can be more readily developed now that a firmer concept 
exists of what is actually found in the research on elementary school 
mathematics. Through such precisely defined categories, the factor of 
perceptual differences may be more readily controlled. 

3. The statistical procedure and other categories could be 
checked for accuracy. 

4. More extensive cross-referencing could be done. 

5. The research could be re-evaluated to secure a confirmation 
of the validity of the present evaluation. 

Continued extension to add reports of research on elementary 
school mathematics is necessary. This would include: 

1. Reports published in American journals for the period 1900
through 1965 which were not discovered by the present reviewer need
to be included in the compilation.

2. Other sources of research reports need to be ... and
compilations developed.

3. Confirmation of the accuracy of the list of dissertations 
is needed. 

4. The compilation and the evaluation of the reports need to 
be extended beyond 1965. 

Synthesis of the data in a form which is meaningful to teachers 
should be done. In particular, an analysis of what the research 
should and can mean to classroom teachers is of vital importance. 

It was noted that two major deficiencies are evident in
research reports: (a) the lack of sufficient information on sampling
and (b) the lack of firm control of variables. These may be merely
problems of reporting. They may also be actual problems of the
research process. Thus it seems that:

1. The improvement of research possibly depends on increasing 
the researcher's awareness of the need to consider these two points 
especially carefully. 

2. The evaluation with the other seven points on the instru- 
ment would seem to show that more careful planning and reporting of 
research projects are needed. 

3. The Instrument for Evaluating Experimental Research Reports
may serve as a guide in planning research as well as a means of
evaluating the finished product.

4. There is a need to develop similar instruments to evaluate
types of research other than experimental. It is the opinion of the
writer that sampling was a problem in most types of research.

5. Researchers need to consider the possibility of planning
experimental research rather than, as has happened in the past,
resorting to ex post facto studies.

6. Careful and precise planning of research is vital. Equally
careful and precise reporting would be helpful.

More research needs to be done on many topics. The topics of
the 799 studies seemed to be almost randomly distributed among
categories. Researchers need to consider the points on which research is
most needed. The possibility of using research as a means of
developing a theory of instruction needs to be carefully and thoughtfully
pursued.

APPENDIX B. Tables of Data on Studies of Reliability of the
Instrument for Evaluating Experimental Research Reports



TABLE IV

ANALYSIS OF VARIANCE: SUMMARY OF
DATA FOR FIRST STUDY

Source                            SS       df       MS

Between articles                1961.6      9     218.0
  Masking (A)                    128.2      1     128.2
  Between articles
  within A (BwA)                1833.4      8     229.2
Within articles                  390.3     20      19.5
  Between judges (C)              40.5      2      20.3
  A x C                           65.2      2      32.6
  C x BwA                        284.6     16      17.8
Total                           2351.9


TABLE V

F RATIOS FOR THE FIRST STUDY

Source                     Error term              F        p

A (masking: fixed)         BwA                     .56
BwA (articles: random)     pooled, C, AC, BwA    11.75    < .01
C (judges: random)         BC                     1.14
A x C                      BC                     1.83
C x BwA                    (no error term)
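The F ratios in Table V can be recomputed from the mean squares in Table IV, and reliability coefficients can then be estimated from the F ratio for articles. The sketch below is hedged: the report does not state which formulas were used, so the two analysis-of-variance intraclass expressions (reliability of the averaged ratings as 1 - 1/F, and reliability of a single rater as (F - 1)/(F + k - 1) for k judges) are assumptions chosen because they are the standard ones. All names are illustrative.

```python
# Mean squares from Table IV (first study); the pooled within-articles term
# is SS / df for the "Within articles" row.
MEAN_SQUARES = {"A": 128.2, "BwA": 229.2, "C": 20.3, "AxC": 32.6, "CxBwA": 17.8}
SS_WITHIN, DF_WITHIN = 390.3, 20
N_JUDGES = 3


def f_ratios(ms, ss_within, df_within):
    """Recompute each F ratio as effect mean square over its error term."""
    pooled_within = ss_within / df_within
    return {
        "A": ms["A"] / ms["BwA"],          # masking tested against BwA
        "BwA": ms["BwA"] / pooled_within,  # articles tested against pooled term
        "C": ms["C"] / ms["CxBwA"],        # judges tested against C x BwA
        "AxC": ms["AxC"] / ms["CxBwA"],    # interaction tested against C x BwA
    }


def reliability_from_f(f, k):
    """Assumed intraclass formulas: (mean of k raters, single rater)."""
    r_mean = 1 - 1 / f
    r_single = (f - 1) / (f + k - 1)
    return r_mean, r_single


ratios = f_ratios(MEAN_SQUARES, SS_WITHIN, DF_WITHIN)
r_mean, r_single = reliability_from_f(ratios["BwA"], N_JUDGES)
for source, f in ratios.items():
    print(f"{source:4s} F = {f:.2f}")
print(f"reliability of mean rating = {r_mean:.2f}, single rater = {r_single:.2f}")
```

Under these assumptions the recomputed values come out close to, though not always identical with, the tabled F ratios and the coefficients reported for the first study; small differences are to be expected because the inputs are rounded.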








TABLE VI

ANALYSIS OF VARIANCE: SUMMARY OF
DATA FOR SECOND STUDY

Source                               SS        df       MS

Between articles                   1581.98      9     175.78
  Masking (A)                       110.21      1     110.21
  Between articles
  within A (BwA)                   1471.77      8     183.97
Between judges                      371.49     11      33.77
  Experience (C)                     69.01      1      69.01
  Field (D)                         130.21      1     130.21
  Experience x
  Field (C x D)                       1.00      1       1.00
  Between judges
  (judges within CD)
  (EwCD)                            171.27      8      21.41
Interaction:
articles x judges                  1040.12     99      10.51
  A x C                               7.00      1       7.00
  A x D                              18.40      1      18.40
  A x C x D                          14.03      1      14.03
  C x BwA, D x BwA,
  CD x BwA, A x EwCD,
  EwCD x BwA
  (pooled residual)                1000.69     96      10.42
Total                              2993.59    119

TABLE VII

F RATIOS FOR THE SECOND STUDY

Source                              Error term       F        p

A (masking: fixed)                  B                .60
BwA (articles: random)              e*             17.66    p < .01
C (experience: fixed)               E               3.22
D (field: fixed)                    E               6.08
EwCD (judges within experience
by field: random)                   e*              2.05
C x D                               E                .05
A x C                               e*               .67
A x D                               e*              1.77
A x C x D                           e*              1.35

*pooled residual = 10.42


TABLE VIII

SUMMARY OF RELIABILITY COEFFICIENTS
FOR THE SECOND STUDY

                                          Interrater     Individual
                                          agreement      reliability
Judges                              N     (AOV)          (Snedecor)

Elementary education
faculty                             3       .77            .53
Educational psychology
faculty                             3       .85            .65
Elementary education
doctoral candidates                 3       .92            .78
Educational psychology
doctoral candidates                 3       .78            .54
Total set of raters                12       .94            .57
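The two columns of Table VIII appear to be linked by the Spearman-Brown relation between the reliability of a single rater and the reliability of the average of k raters. The sketch below is an assumption offered for checking the table, not a statement of the method used in the study; the function name and the row tuples are illustrative, and the coefficients are entered as read from the table.

```python
def spearman_brown(r_single, k):
    """Reliability of the mean of k raters, given the single-rater reliability."""
    return k * r_single / (1 + (k - 1) * r_single)


# (group, number of judges, individual reliability, reported interrater agreement)
TABLE_VIII = [
    ("Elementary education faculty", 3, 0.53, 0.77),
    ("Educational psychology faculty", 3, 0.65, 0.85),
    ("Elementary education doctoral candidates", 3, 0.78, 0.92),
    ("Educational psychology doctoral candidates", 3, 0.54, 0.78),
    ("Total set of raters", 12, 0.57, 0.94),
]

predicted = {
    group: spearman_brown(r_single, k) for group, k, r_single, _ in TABLE_VIII
}
for group, k, r_single, reported in TABLE_VIII:
    print(f"{group}: predicted {predicted[group]:.2f}, reported {reported:.2f}")
```

If the relation holds, each predicted value should fall within rounding distance of the reported interrater agreement, which also illustrates why the coefficient for one judge is always the lower of the pair.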






APPENDIX C. Tables of Summaries Resulting from Categorization

TABLE IX

FREQUENCY OF REPORTS BY JOURNAL SOURCE

American Education                                        1
American Educational Research Journal                     3
American Journal of Mental Deficiency                     4
American Journal of Psychology                            2
Arithmetic Teacher                                      158

Baltimore Bulletin of Education                           1

California Journal of Educational Research               14
Catholic Education Review                                 1
Chicago Schools Journal                                   4
Child Development                                        23
Childhood Education                                       5

Education                                                 8
Educational Administration and Supervision                9
Educational Method (Journal of Educational
  Method)                                                16
Educational Outlook                                       1
Educational Research Bulletin                            27
Elementary English Review                                 2
Elementary School Journal (Elementary
  School Teacher)                                       132

Harvard Educational Review                                2
High Points                                               1

Instructor                                                1

Journal of Applied Psychology                             1
Journal of Education                                      2
Journal of Educational Psychology                        57
Journal of Educational Research                         138
Journal of Exceptional Children                           2
Journal of Experimental Education                        30
Journal of Experimental Psychology                        2
Journal of Genetic Psychology (Pedagogical
  Seminary; Pedagogical Seminary and Journal
  of Genetic Psychology)                                 37
Journal of Psychology                                     1
Journal of Social Psychology                              1

Mathematics Teacher                                      36

Nation's Schools                                          1
National Educational Association Journal                  2
National Elementary Principal                             1

[illegible]                                               1

Peabody Journal of Education                             11
Pittsburgh Schools                                        1
Pittsburgh University School of Education
  Journal                                                 1

Reading Teacher                                           1

Scientific American                                       1
School Board Journal                                      1
School Executive                                          3
School Review                                             3
School Science and Mathematics                           35
School and Society                                        5

Teachers College Record                                   4
Theory into Practice                                      1
Training School Bulletin                                  1

Wisconsin Journal of Education                            4


TABLE X (continued)


TABLE XI

FREQUENCY OF QUALITATIVE VALUE BY YEARS

(Columns: qualitative value; 1900-29; 1930-50; 1951-65. The cell
entries are illegible in this copy.)


APPENDIX D. Categories and Coding for Mathematical Topic



a. Educational objectives and instructional procedures

1) Historical development and procedures 

2) Values of arithmetic 

3) Planning and organizing for teaching (meaning approach;
multi-graded; departmentalized, self-contained; non-graded;
team teaching; modern, traditional; exposition,
discovery; incidental, systematic; activity program;
teaching practices)

4) Attitude and climate 

5) Specific procedures 

a) Drill and practice 

b) Problem solving 

c) Estimation 

d) Mental computation 

e) Homework 

f) Review 

g) Checking 

h) Writing and reading numerals 

6) Foreign comparisons 

b. Topical placement

1) Pre-first-grade concepts 

2) Readiness 

3) Logical order 

4) Quantitative understanding 

5) Content to be included in grade 

6) Time allotment 

c. Basic concepts (and methods of teaching them)

1) Counting 

2) Number properties and relations 

3) Whole numbers 

a) Addition 

b) Subtraction 

c) Multiplication 

d) Division 

4) Fractions 

a) Addition 

b) Subtraction 

c) Multiplication 

d) Division 

5) Decimals 

6) Percentage 

7) Ratio and proportion 

8) Measurement (time, denominate numbers) 

9) Negative numbers (integers) 

10) Algebra 

11) Geometry

12) Sets

13) Logic

14) Our numeration system

15) Other numeration systems

16) Probability and statistics (graphing)

d. Materials

1) Textbooks 

2) Workbooks 

3) Manipulative devices 

4) Audio-visual devices

5) Programmed instruction 

6) Readability and vocabulary 

7) Quantitative concepts in other subject areas 

e. Individual differences 

1) Diagnosis (errors) 

2) Remediation (slow learner, underachiever) 

3) Enrichment (acceleration) 

4) Grouping procedures (ability, homogeneous, individualized, 
flexible) 

5) Physical, psychological, and/or social characteristics 

6) Sex differences 

7) Socio-economic differences 

f. Evaluating progress

1) Testing 

2) Achievement evaluation 

3) Relation to achievement 

a) Age 

b) Intelligence 

4) Effect of parental knowledge 

5) Effect of teacher background 

g. Studies related to learning theory

1) Transfer 

2) Retention (retroactive inhibition) 

3) Generalization 

4) Organization (process, reasoning) 

5) Motivation 

6) Piagetian concepts 

7) Reinforcement (knowledge of results) 



APPENDIX E. Categories and Coding for Type of Study

Descriptive:      research in which the researcher reports on
                  records which may have been kept by someone
                  else; includes reviews, historical studies,
                  and textbook analyses or comparisons

Survey:           research which attempts to find characteristics
                  of a population by asking a sample
                  through the use of a questionnaire or interview;
                  includes also the status study, in
                  which a group is investigated as it is to
                  ascertain pertinent characteristics (measures
                  assigned variable only)

Case study:       research in which the researcher describes in
                  depth what is happening to one designated
                  unit, usually one child

Action research:  research which uses nominal controls; generally
                  teacher or school originated; procedures
                  of actual practice may be described

Correlational:    research which studies relationships between
                  or among two or more variables; uses
                  correlational statistic primarily

Ex post facto:    research in which the independent variable or
                  variables were manipulated in the past; the
                  researcher starts with the observation of a
                  dependent variable or variables. He then
                  studies the independent variables in retrospect
                  for their possible effects on the
                  dependent variables. (He may examine
                  interrelationships of two or more assigned variables
                  or two or more levels of one assigned variable)

Experimental:     research in which the independent variable or
                  variables are manipulated by the researcher to
                  quantitatively measure their effect on some
                  dependent variable or variables, to test a
                  logically derived hypothesis


APPENDIX F. Instrument for Evaluating Experimental Research Reports



Directions:

Evaluate with the nine underlined questions which follow.
The quality of the research report in terms of each question should
be rated on a five-point scale. The specifications for these five
points are:

1) Excellent:  all requirements for the question are met;
               nothing essential could be added
2) Very good:  most requirements are met
3) Good:       some requirements are met
4) Fair:       a few requirements are met
5) Poor:       none or too few of the requirements are met

Certain "key points" should be considered in ascertaining a
rating for each question. These are listed below the question,
followed by adjectives which indicate the continuum on which the
"key point" should be assessed. Do NOT make a response to these
"key points." They are intended to focus the attention of all
raters on the same pertinent aspects of each question.

Please make only nine responses for each article, one for
each question.
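The nine ratings for an article can be combined into a single qualitative value. The sketch below assumes that the value is the plain sum of the nine ratings, which matches the possible range of 9 to 45 cited in the report but is not spelled out in these directions; the function name and constants are hypothetical.

```python
N_QUESTIONS = 9          # one response per question
SCALE = range(1, 6)      # 1 = excellent ... 5 = poor


def qualitative_value(ratings):
    """Sum nine ratings on the 1-5 scale (assumed scoring rule; range 9 to 45)."""
    ratings = list(ratings)
    if len(ratings) != N_QUESTIONS:
        raise ValueError(f"expected {N_QUESTIONS} ratings, got {len(ratings)}")
    for rating in ratings:
        if rating not in SCALE:
            raise ValueError(f"rating {rating!r} is outside the 1-5 scale")
    return sum(ratings)


# Example: an article rated "good" (3) on every question.
print(qualitative_value([3] * N_QUESTIONS))
```

Validating the count and the scale before summing mirrors the instruction that raters make exactly nine responses per article.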



Instrument for Evaluating Experimental Research Reports

Marilyn N. Suydam
The Pennsylvania State University


1. How practically or theoretically significant is the problem?
   (1-2-3-4-5)

   a. Purpose                          (important---non-important)
   b. Problem origin
      1) Rationale                     (logical---illogical)
      2) Previous research             (appropriate---inappropriate)

2. How clearly defined is the problem? (1-2-3-4-5)

   a. Question
   b. Hypothesis(es)
   c. Independent variable(s)
   d. Dependent variable(s)

   (operational---vague)
   (relevant---irrelevant)
   (logical---illogical)
   (relevant---irrelevant)
   (operational---vague)
   (relevant---irrelevant)

3. How well does the design answer the research question?
   (1-2-3-4-5)

   a. Paradigm                         (appropriate---inappropriate)
   b. Hypothesis(es)                   (testable---untestable)
   c. Procedures                       (clear---unclear)
                                       (replicable---unreplicable)
   d. Treatments                       (appropriate---inappropriate)
   e. Duration                         (appropriate---inappropriate)

4. How adequately does the design control variables? (1-2-3-4-5)

   a. Independent variable(s)          (uncontaminated---contaminated)
   b. Administration of treatment      (rigorous---unrigorous)
   c. Teacher or group factors         (controlled---uncontrolled)
   d. Subject or experimenter bias     (controlled---uncontrolled)
   e. Halo effect                      (controlled---uncontrolled)
   f. Extraneous factors               (controlled---uncontrolled)
   g. Individual factors               (controlled---uncontrolled)

5. How properly is the sample selected for the design and purpose
   of the research? (1-2-3-4-5)

   a. Population                       (appropriate---inappropriate)
   b. Drawing of sample                (random---unspecified)
   c. Assignment of treatment          (random---unspecified)
   d. Size                             (appropriate---inappropriate)
   e. Characteristics                  (appropriate---inappropriate)

6. How valid and reliable are the measuring instruments or
   observational techniques? (1-2-3-4-5)

   a. Instrument or technique
      1) Description                   (excellent---poor)
      2) Validity                      (appropriate---inappropriate)
      3) Reliability for population    (excellent---poor)
   b. Procedure of data collection     (careful---careless)

7. How valid are the techniques of analysis of data? (1-2-3-4-5)

   a. Statistical tests
      1) Basic assumptions
      2) Relation to design
   b. Data
      1) Treatment
      2) Presentation
      3) Level of significance
      4) Discussion

   (satisfied---unclear)
   (appropriate---inappropriate)
   (appropriate---inappropriate)
   (clear---unclear)
   (appropriate---inappropriate)
   (specified---unspecified)
   (accurate---inaccurate)

8. How appropriate are the interpretations and generalizations from
   the data? (1-2-3-4-5)

   a. Consistency with results         (excellent---poor)
   b. Generalizations                  (reasonable---exaggerated)
   c. Implications                     (reasonable---exaggerated)
   d. Limitations                      (noted---not noted)

9. How adequately is the research reported? (1-2-3-4-5)

   a. Organization                     (excellent---poor)
   b. Style                            (clear---vague)
   c. Grammar                          (good---poor)
   d. Completeness                     (excellent---poor)
                                       (replicable---unreplicable)

